Coupling a generative model with a discriminative learning framework for speaker verification

Authors

Abstract

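The task of speaker verification (SV) is to decide whether an utterance is spoken by a target speaker or an imposter speaker. In most studies of SV, a log-likelihood ratio (LLR) score is estimated based on a generative probability model of speaker features and compared with a threshold to make a decision. However, the generative model usually focuses on individual feature distributions, does not have discriminative feature selection ability, and is easily distracted by nuisance features. SV, as a hypothesis test, could be formulated as a binary discrimination task to which neural network based discriminative learning can be applied. In discriminative learning, nuisance features can be removed with the help of label supervision. However, discriminative learning pays more attention to classification boundaries and is prone to overfitting the training set, which may result in poor generalization on a test set. In this paper, we propose a hybrid learning framework, i.e., coupling the structure and parameters of a joint Bayesian (JB) generative model with a neural discriminative learning framework for SV. In the hybrid framework, a two-branch Siamese neural network is built with dense layers that are coupled with the factorized affine transforms used in the JB model. The LLR score estimation in the JB model is formulated according to the distance metric in the discriminative learning framework. By initializing the two-branch neural network with the generatively learned parameters of the JB model, we further train the model parameters on pairwise samples as a binary discrimination task. Moreover, a direct evaluation metric (DEM) for SV based on the minimum empirical Bayes risk (EBR) is designed and integrated as an objective function in the discriminative learning. We carried out SV experiments on Speakers in the Wild (SITW) and VoxCeleb. Experimental results showed that our proposed model improved performance by a large margin compared with state-of-the-art models.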
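As a rough illustration of the LLR-and-threshold decision the abstract describes, the sketch below scores a pair of scalar features under a toy joint Bayesian style model. It assumes each feature is a zero-mean speaker term with variance `s_mu` plus an independent within-speaker term with variance `s_eps`; the function names, the scalar simplification, and the default threshold are illustrative assumptions, not the paper's implementation, and the Siamese network and EBR objective are not shown.

```python
import math


def log_gauss2(x1, x2, var, cov):
    """Log density at (x1, x2) of a zero-mean bivariate Gaussian with
    covariance [[var, cov], [cov, var]]."""
    det = var * var - cov * cov
    # Quadratic form [x1 x2] Sigma^{-1} [x1 x2]^T, using the closed-form 2x2 inverse.
    quad = (var * x1 * x1 - 2.0 * cov * x1 * x2 + var * x2 * x2) / det
    return -0.5 * (quad + math.log(det)) - math.log(2.0 * math.pi)


def jb_llr(x1, x2, s_mu, s_eps):
    """Toy LLR (assumed scalar model): log p(x1, x2 | same speaker)
    minus log p(x1, x2 | different speakers)."""
    total = s_mu + s_eps
    # Same speaker: the shared speaker term makes the two features covary by s_mu.
    same = log_gauss2(x1, x2, total, s_mu)
    # Different speakers: the two features are independent.
    diff = log_gauss2(x1, x2, total, 0.0)
    return same - diff


def verify(x1, x2, s_mu, s_eps, threshold=0.0):
    """Accept the target-speaker hypothesis when the LLR exceeds the threshold."""
    return jb_llr(x1, x2, s_mu, s_eps) > threshold
```

With a large between-speaker variance relative to the within-speaker variance, two nearby features yield a positive LLR and two features of opposite sign yield a negative one, so the threshold separates the two hypotheses.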

Similar resources

DETAC: a discriminative criterion for speaker verification

This paper introduces a general criterion applicable to discriminative training of detection systems, and discusses its particular implementation in GMM-based text-independent speaker verification. Based on an analysis of the detection error trade-off curve of a baseline system, we argue that the new criterion extends several conventional methods such as the maximum posterior training by logist...

Discriminative adaptation for speaker verification

This paper describes a speaker verification system in which the talker and imposter models are adapted to achieve maximum discrimination, or equivalently minimum verification error. This goal is accomplished by extending the minimum error classification criterion (MCE) and generalized probabilistic descent (GPD) algorithm to the task of adapting talker model parameters and the corresponding anti-...

Discriminative adaptation for speaker verification

Speaker verification is a binary classification task to determine whether a claimed speaker uttered a phrase. Current approaches to speaker verification tasks typically involve adapting a general speaker Universal Background Model (UBM), normally a Gaussian Mixture Model (GMM), to model a particular speaker. Verification is then performed by comparing the likelihoods from the speaker model to t...

Large Margin GMM for discriminative speaker verification

Gaussian mixture models (GMM), trained using the generative criterion of maximum likelihood estimation, have been the most popular approach in speaker recognition during the last decades. This approach is also widely used in many other classification tasks and applications. Generative learning is not, however, the optimal way to address classification problems. In this paper we first present a ne...

A Robust Framework for Forensic Speaker Verification

This paper discusses the application of automatic speaker verification systems in forensic casework. A framework for reporting the system outcome is proposed. Specific system requirements to properly cope with forensic idiosyncrasies are analyzed through a series of simulations. Results suggest that the design of a forensic speaker verification system does not necessarily match the settings of curre...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3129360